SEARCH TREES: METRIC ASPECTS 
AND STRONG LIMIT THEOREMS 



RUDOLF GRUBEL 

Abstract. We consider random binary trees that appear as the output of 
certain standard algorithms for sorting and searching if the input is random. 
We introduce the subtree size metric on search trees and show that the resulting 
metric spaces converge with probability 1. This is then used to obtain almost 
sure convergence for various tree functionals, together with representations of 
the respective limit random variables as functions of the limit tree. 



1. Introduction 

A sequential algorithm transforms an input sequence $t_1, t_2, \ldots$ into an output
sequence $x_1, x_2, \ldots$ where, for all $n \in \mathbb{N}$, $x_{n+1}$ depends on $x_n$ and $t_{n+1}$ only. Typically,
the output variables are elements of some combinatorial family $\mathbb{F}$, each $x \in \mathbb{F}$ has a
size parameter $\phi(x) \in \mathbb{N}$, and $x_n$ is an element of the set $\mathbb{F}_n := \{x \in \mathbb{F} : \phi(x) = n\}$
of objects of size $n$. In the probabilistic analysis of such algorithms one starts with
a stochastic model for the input sequence and one is interested in certain aspects
of the output sequence. The standard input model assumes that the $t_i$'s are the
values of a sequence $\eta_1, \eta_2, \ldots$ of independent and identically distributed random
variables. For random input of this type the output sequence then is the path of a
Markov chain $X = (X_n)_{n\in\mathbb{N}}$ that is adapted to the family $\mathbb{F}$ in the sense that

(1)    $P(X_n \in \mathbb{F}_n) = 1$ for all $n \in \mathbb{N}$.

Clearly, X is highly transient — no state can be visited twice. 

The special case we are interested in, and which we will use to demonstrate an 
approach that is generally applicable in the situation described above, is that of 
binary search trees and two standard algorithms, known by their acronyms BST 
(binary search tree) and DST (digital search tree). These are discussed in detail in 
the many excellent texts in this area, for example in [Knu73], [Mah92] and [Drm09].
Various functionals of the search trees, such as the height [Dev86], the path length
[Reg89] [EMI], the node depth profile [CDJH01] [FHN06], the subtree size
profile [Fuc08] [DG10], the Wiener index [Nei02], and the silhouette [Gru09] have been
studied, with methods spanning the wide range from generatingfunctionology to
martingale methods to contraction arguments on metric spaces of probability
distributions (neither of these lists is complete). Many of the results are asymptotic
in nature, where the convergence obtained as $n \to \infty$ may refer to the
distributions or to the random variables themselves. As far as strong limit theorems are
concerned, a significant step towards a unifying approach was made in the recent
paper [EGW12], where methods from discrete potential theory were used to obtain
limit results on the level of the combinatorial structures themselves: In a suitable
extension of the state space $\mathbb{F}$, the random variables $X_n$ converge almost surely as
$n \to \infty$, and the limit generates the tail $\sigma$-field of the Markov chain. The results
in [EGW12] cover a wide variety of structures; search trees are a special case. It
should also be mentioned here that the use of boundary theory has a venerable
tradition in connection with random walks; see [KV83] and [Woe00].

Our aims in the present paper are the following. First, we use the algorithmic 
background for a direct proof of the convergence of the BST variables $X_n$, as
$n \to \infty$, to a limit object $X_\infty$, and we obtain a representation of $X_\infty$ in terms of
the input sequence $(\eta_i)_{i\in\mathbb{N}}$. Second, we introduce the subtree size metric on finite
binary trees. This leads to a reinterpretation of the above convergence in terms of
metric trees. We also introduce a family of weighted variants of this metric, with
parameter $p \ge 1$, and then identify the critical value $p_0$ with the property that
the metric trees converge for $p < p_0$ and do not converge if $p > p_0$. The value
$p_0$ turns out to also be the threshold for compactness of the limit tree. Third,
we use convergence at the tree level to (re)obtain strong limit theorems for three
tree functionals — the path length, the Wiener index, and a metric version of the 
silhouette. 

These topics are treated in the next three sections, where each has its own 
introductory remarks. In the last section we collect some general comments and 
also mention some open questions. 

2. Binary search trees 

We first introduce some notation, mostly specific to binary trees, then discuss 
the two search algorithms and the associated Markov chains, and finally recall the 
results from [EGW12] related to these structures, including an alternative proof of 
the main limit theorem. 

2.1. Some notation. We write $\mathcal{L}(X)$ for the distribution of a random variable
$X$ and $\mathcal{L}(X|Y=k)$, $\mathcal{L}(X|Y)$, $\mathcal{L}(X|\mathcal{F})$ for the various versions of the conditional
distribution of $X$ given (the value of) a random variable $Y$ or a $\sigma$-field $\mathcal{F}$. Further,
$\delta_c$ is the one-point mass at $c$, $1_A$ is the indicator function of the set $A$ (so that
$1_A(c) = \delta_c(A)$), $\mathrm{Bin}(n,p)$ denotes the binomial distribution with parameters $n \in \mathbb{N}$
and $p \in (0,1)$, $\mathrm{Beta}(\alpha,\beta)$ is the beta distribution with parameters $\alpha, \beta > 0$, and
$\mathrm{unif}(0,1) = \mathrm{Beta}(1,1)$ is the uniform distribution on the unit interval. We also
write $\mathrm{unif}(M) = (\#M)^{-1} \sum_{c\in M} \delta_c$ for the uniform distribution on a finite set $M$.
Let

    $\mathbb{V}_k := \{0,1\}^k$,    $\mathbb{V} := \bigcup_{k\in\mathbb{N}_0} \mathbb{V}_k$,    $\partial\mathbb{V} := \{0,1\}^\infty$

be the set of 0-1 sequences of length $k$, $k \in \mathbb{N}_0$, the set of all finite 0-1 sequences,
and the set of all infinite 0-1 sequences respectively. The set $\mathbb{V}_0$ has $\emptyset$, the 'empty
sequence', as its only element, and $|u|$ is the length of $u \in \mathbb{V}$, i.e. $|u| = k$ if $u \in \mathbb{V}_k$.
For each node $u = (u_1, \ldots, u_k) \in \mathbb{V}$ we use

    $u0 := (u_1, \ldots, u_k, 0)$,
    $u1 := (u_1, \ldots, u_k, 1)$,
    $\bar u := (u_1, \ldots, u_{k-1})$, if $k \ge 1$,

to denote its left and right direct descendant (child) and its direct ancestor (parent).
We write $u \le v$ for $u = (u_1, \ldots, u_k) \in \mathbb{V}$, $v = (v_1, \ldots, v_l) \in \mathbb{V}$ if $k \le l$ and $u_j = v_j$ for
$j = 1, \ldots, k$, i.e. if $u$ is a prefix of $v$; the extension to $v \in \partial\mathbb{V}$ is obvious. The prefix
order is a partial order only, but there exists a unique minimum $u \wedge v$ to any two
nodes, their last common ancestor; again, this can be extended to elements

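of $\partial\mathbb{V}$. Another ordering on $\mathbb{V}$ can be obtained via the function $\beta : \mathbb{V} \to [0,1]$,

(2)    $\beta(u) := \frac{1}{2} + \sum_{i=1}^{k} \frac{2u_i - 1}{2^{i+1}}$,    $u = (u_1, \ldots, u_k) \in \mathbb{V}$.

This will be useful in various proofs, and also in connection with illustrations.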

By a binary tree we mean a subset $x$ of the set $\mathbb{V}$ of nodes that is prefix stable
in the sense that $u \in x$ and $v \le u$ implies that $v \in x$. Informally, we regard the
components $u_1, \ldots, u_k$ of $u$ as a routing instruction leading to the vertex $u$, where
0 means a move to the left, 1 a move to the right, and the empty sequence is the
root node. The edges of the tree $x$ are the pairs $(\bar u, u)$, $u \in x$, $u \ne \emptyset$. A node is
external to a tree if it is not one of its elements but its direct ancestor is; we write
$\partial x := \{u \in \mathbb{V} : \bar u \in x, u \notin x\}$ for the set of external nodes of $x$. Finally,

(3)    $\sigma(x,u) := \#\{v \in x : u \le v\}$

is the size of the subtree of $x$ rooted at $u$ (or the number of descendants of $u$ in $x$,
including $u$).

Let $\mathbb{B}$ denote the (countable) set of finite binary trees, $\mathbb{B}_n := \{x \in \mathbb{B} : \#x = n\}$
those of size (number of nodes) $n$. The single element of $\mathbb{B}_1$ is $\{\emptyset\}$, the tree that
consists of the root node only.

2.2. Search algorithms and Markov chains. Let $(t_i)_{i\in\mathbb{N}}$ be a sequence of
pairwise distinct real numbers. The BST (binary search tree) algorithm stores these
sequentially into labelled binary trees $(x_n, L_n)$, $n \in \mathbb{N}$, with $x_n \in \mathbb{B}_n$ and
$L_n : x_n \to \{t_1, \ldots, t_n\}$. For $n = 1$ we have $x_1 = \{\emptyset\}$ and $L_1(\emptyset) = t_1$. Given $(x_n, L_n)$, we
construct $(x_{n+1}, L_{n+1})$ as follows: Starting at the root node we compare the next
input value $t_{n+1}$ to the value $L_n(u)$ attached to the node $u$ under consideration and
move to $u0$ if $t_{n+1} < L_n(u)$ and to $u1$ otherwise, until an 'empty' node $u$
(necessarily an external node of $x_n$) is found. Then $x_{n+1} := x_n \cup \{u\}$ and $L_{n+1}(u) := t_{n+1}$,
$L_{n+1}(v) := L_n(v)$ for all $v \in x_n$.
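The insertion step just described can be sketched in a few lines of Python. This is a minimal illustration only, assuming that nodes are encoded as tuples of 0s and 1s (the empty tuple is the root); the names `bst_insert` and `bst` are hypothetical and not taken from the paper.

```python
def bst_insert(tree, labels, t):
    """Insert the value t into a labelled binary tree.

    tree   : set of nodes, each node a tuple of 0/1 routing instructions
    labels : dict mapping each node of the tree to its stored value
    """
    u = ()                                   # start at the root (empty sequence)
    while u in tree:                         # descend until an 'empty' node is found
        u = u + (0,) if t < labels[u] else u + (1,)
    tree.add(u)                              # u is an external node of the old tree
    labels[u] = t
    return u


def bst(values):
    """Build the labelled tree for a sequence of pairwise distinct values."""
    tree, labels = set(), {}
    for t in values:
        bst_insert(tree, labels, t)
    return tree, labels


tree, labels = bst([0.5, 0.2, 0.8, 0.3])
```

Here the value 0.3 is smaller than the root label 0.5 and larger than the label 0.2 of the left child, so it ends up at the node `(0, 1)`.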

Now let $(\eta_i)_{i\in\mathbb{N}}$ be a sequence of independent random variables with $\mathcal{L}(\eta_i) =
\mathrm{unif}(0,1)$ for all $i \in \mathbb{N}$ and let $X_n$ be the random binary tree associated with the
first $n$ of these. By construction, the label functions $L_n$ are monotone with respect
to the $\beta$-order of the tree nodes, i.e. with $\beta$ as in (2),

(4)    $\beta(u) < \beta(v) \iff L_n(u) < L_n(v)$, for all $n$ with $\{u,v\} \subset X_n$.

In particular, if we number the external nodes of $X_n$ from the left to the right,
then the number of the node that receives $\eta_{n+1}$ is the rank of this value among
$\{\eta_1, \ldots, \eta_n\}$, hence uniformly distributed on $\{1, \ldots, n+1\}$. This shows that the
(deterministic) BST algorithm, when applied to the (random) input $(\eta_i)_{i\in\mathbb{N}}$, results
in a Markov chain $(X_n)_{n\in\mathbb{N}}$ with state space $\mathbb{B}$, start at $X_1 = \{\emptyset\}$, and transition
probabilities

(5)    $Q(x, x \cup \{u\}) = 1/(1 + \#x)$ if $u \in \partial x$, and $Q(x, x \cup \{u\}) = 0$ otherwise.

In words: We obtain $X_{n+1}$ by choosing one of the $n+1$ external nodes of $X_n$
uniformly at random and joining it to the tree. We refer to this construction as the
BST chain.
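The verbal description of the BST chain translates directly into a simulation. The following is a sketch under the assumption that trees are stored as sets of 0-1 tuples; `external_nodes` and `bst_chain` are illustrative names, and the seed is arbitrary.

```python
import random


def external_nodes(tree):
    """Nodes that are not in the tree but whose parent is (for a non-empty tree)."""
    return [u + (b,) for u in tree for b in (0, 1) if u + (b,) not in tree]


def bst_chain(n, rng):
    """Run the BST chain up to time n, starting from the root-only tree X_1."""
    tree = {()}
    while len(tree) < n:
        # choose one of the len(tree) + 1 external nodes uniformly at random
        tree.add(rng.choice(sorted(external_nodes(tree))))
    return tree


x = bst_chain(20, random.Random(1))
```

A tree with n nodes always has n + 1 external nodes, which is why the uniform choice has probability 1/(n + 1) for each candidate.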

For the DST (digital search tree) algorithm the input values are infinite 0-1
sequences, i.e. elements of $\partial\mathbb{V}$. Given $t_1, t_2, \ldots \in \partial\mathbb{V}$ we again obtain a sequence
$x_1, x_2, \ldots$ of labelled binary trees, but now we use the components $t_{n+1,k}$, $k \in \mathbb{N}$,
of the next input value $t_{n+1}$ as a routing instruction through $x_n$, moving to $u0$
from an occupied node $u \in \mathbb{V}_k$ if $t_{n+1,k+1} = 0$ and to $u1$ otherwise. As in the
BST case we assume that the $t_i$'s are the values of a sequence of independent and
identically distributed random variables $\eta_i$, where the distribution of the $\eta_i$'s is now
a probability measure $\mu$ on the measurable space $(\partial\mathbb{V}, \mathcal{B}(\partial\mathbb{V}))$, with $\mathcal{B}(\partial\mathbb{V})$ the
$\sigma$-field generated by the projections on the sequence elements, $\partial\mathbb{V} \ni t = (t_k)_{k\in\mathbb{N}} \mapsto t_i$,
$i \in \mathbb{N}$. This $\sigma$-field is also generated by the sets

(6)    $A_u := \{v \in \partial\mathbb{V} : v \ge u\}$,    $u \in \mathbb{V}$.

It is easy to check that the intersection of two such sets is either empty or again of
this form. This implies that $\mu$ is completely specified by its values $\mu(A_u)$, $u \in \mathbb{V}$,
and the DST analogue of (5) then is

(7)    $P(X_{n+1} = x \cup \{u\} \mid X_n = x) = \mu(A_u)$ for all $u \in \partial x$.

By the DST chain with driving distribution $\mu$ we mean a Markov chain $(X_n)_{n\in\mathbb{N}}$
with state space $\mathbb{B}$, start at $\{\emptyset\}$ and transition mechanism given by (7).
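For comparison with the BST case, here is a minimal sketch of the DST insertion step. It assumes that each input is given as a finite prefix of its 0-1 sequence that is long enough to reach an unoccupied node; the function name `dst_insert` and the sample inputs are illustrative only.

```python
def dst_insert(tree, bits):
    """Route a 0-1 sequence through the tree and occupy the first free node.

    tree : set of occupied nodes (tuples of 0/1)
    bits : sequence of 0/1 values, a finite prefix of the input
    """
    u = ()
    for b in bits:
        if u not in tree:        # first unoccupied node on the route
            break
        u = u + (b,)             # bit 0 moves left, bit 1 moves right
    if u in tree:
        raise ValueError("prefix too short to reach an unoccupied node")
    tree.add(u)
    return u


tree = set()
for word in [(0, 1, 1, 0), (0, 0, 1, 1), (1, 0, 0, 0), (0, 1, 0, 0)]:
    dst_insert(tree, word)
```

In contrast to the BST algorithm, no comparison with stored labels takes place: the route depends on the new input only, and the tree shape merely determines where the route stops.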

2.3. Doob-Martin compactification. We refer the reader to Doob's seminal
paper [Doo59] and to the recent textbook [Woe09] for the main results of, background
on, and further references for the boundary theory for transient Markov chains.
For the BST chain the Doob-Martin compactification has recently been obtained
in [EGW12]: It can be described as the closure $\bar{\mathbb{B}}$ of the embedding of $\mathbb{B}$ into the
compact space $[0,1]^{\mathbb{V}}$, endowed with pointwise convergence, that is given by the
standardized subtree size functional,

    $\mathbb{B} \ni x \mapsto \bigl(u \mapsto \sigma(x,u)/\sigma(x,\emptyset)\bigr) \in [0,1]^{\mathbb{V}}$,

with $\sigma$ as defined in (3). Further, the elements of the boundary $\partial\mathbb{B}$ may be
represented by probability measures $\mu$ on $(\partial\mathbb{V}, \mathcal{B}(\partial\mathbb{V}))$, with convergence $x_n \to \mu$ of a
sequence $(x_n)_{n\in\mathbb{N}}$ in $\mathbb{B}$ meaning that

    $\mu(A_u) = \lim_{n\to\infty} \frac{\sigma(x_n,u)}{\sigma(x_n,\emptyset)}$ for all $u \in \mathbb{V}$,

and $\mu_n(A_u) \to \mu(A_u)$ for all $u \in \mathbb{V}$ if we have a sequence $(\mu_n)_{n\in\mathbb{N}}$ of elements of
$\partial\mathbb{B}$ instead.

The general theory implies that $X_n$ converges almost surely to a limit $X_\infty$ with
values in $\partial\mathbb{B}$; [EGW12] also contains a description of $\mathcal{L}(X_\infty)$. The proof given
there does not make use of the algorithmic background but takes the transition
mechanism (5) as its starting point. We now show that this background leads to
a direct proof of $X_n \to X_\infty$, and to a representation of $X_\infty$ in terms of the input
sequence.

We need some more notation. On $\mathbb{V}$ we define a metric $d_{\mathbb{V}}$ by

(8)    $d_{\mathbb{V}}(u,v) := 2^{-|u\wedge v|} - \frac{1}{2}\bigl(2^{-|u|} + 2^{-|v|}\bigr)$,    $u, v \in \mathbb{V}$.

On $\mathbb{V}$ itself this gives the discrete topology, and the completion of $\mathbb{V}$ with respect
to $d_{\mathbb{V}}$ leads to $\bar{\mathbb{V}} := \mathbb{V} \cup \partial\mathbb{V}$, a compact and separable metric space. This is also
the ends compactification if we regard $\mathbb{V}$ as the complete rooted binary tree. We
extend the $A_u$'s to $\bar{\mathbb{V}}$ by

    $A_u := \{v \in \bar{\mathbb{V}} : v \ge u\}$,    $u \in \mathbb{V}$.

Because of 

A u :={veY: d v {u,v)<2-^}^{veY: dy(u, v) < 2-^- 1 } 
these sets are open and closed. Further, 

    $\{u\} = A_u \setminus (A_{u0} \cup A_{u1})$,    $A_u \cap A_v = \begin{cases} A_v, & \text{if } u \le v, \\ A_u, & \text{if } u \ge v, \\ \emptyset, & \text{otherwise,} \end{cases}$

hence $\{A_u : u \in \mathbb{V}\}$ is a $\pi$-system that generates $\mathcal{B}(\bar{\mathbb{V}})$. Together these facts imply
that weak convergence of probability measures $\mu_n$ to a probability measure $\mu$ on
$(\bar{\mathbb{V}}, \mathcal{B}(\bar{\mathbb{V}}))$ is equivalent to

(9)    $\lim_{n\to\infty} \mu_n(A_u) = \mu(A_u)$ for all $u \in \mathbb{V}$.

In view of

    $\frac{1}{n}\,\sigma(X_n,u) = \mathrm{unif}(X_n)(A_u)$

and $X_\infty(\mathbb{V}) = 0$, convergence in the Doob-Martin topology is therefore equivalent
to the weak convergence of probability measures on the metric space $(\bar{\mathbb{V}}, d_{\mathbb{V}})$, if we
represent finite subsets $M$ of $\mathbb{V}$ by the uniform distribution $\mathrm{unif}(M)$ on $(\bar{\mathbb{V}}, \mathcal{B}(\bar{\mathbb{V}}))$.

Moreover, any sequence $(\mu_n)_{n\in\mathbb{N}}$ of probability measures on $(\bar{\mathbb{V}}, \mathcal{B}(\bar{\mathbb{V}}))$ is tight, as
$\bar{\mathbb{V}}$ is compact, and therefore has a limit point by Prohorov's theorem [Bil68, p. 37].
If $(\mu_n(A_u))_{n\in\mathbb{N}}$ is a convergent sequence for each $u \in \mathbb{V}$, then there is only one such
limit point, which means that $\mu_n$ converges weakly to some probability measure $\mu$
and that (9) holds. Finally, let

(10)    $\tau(u) := \inf\{n \in \mathbb{N} : X_n \ni u\}$,    $u \in \mathbb{V}$,

be the time that the node $u$ becomes an element of the BST sequence. It is easy
to see that the $\tau(u)$'s are finite with probability 1.

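Theorem 1. Let $(X_n)_{n\in\mathbb{N}}$ be the sequence of binary trees generated by the BST
algorithm with input a sequence $(\eta_i)_{i\in\mathbb{N}}$ of independent and identically distributed
random variables with $\mathcal{L}(\eta_1) = \mathrm{unif}(0,1)$.

(a) With probability 1 the sequence $\mathrm{unif}(X_n)$ converges weakly to a random
probability measure $X_\infty$ on $(\partial\mathbb{V}, \mathcal{B}(\partial\mathbb{V}))$ as $n \to \infty$.

(b) For each $u \in \mathbb{V}$, $u \ne \emptyset$, with $i := \tau(u) - 1$, $\tau$ as in (10), and

    $0 =: \eta_{(i:0)} < \eta_{(i:1)} < \cdots < \eta_{(i:i)} < \eta_{(i:i+1)} := 1$

the augmented order statistics associated with $\eta_1, \ldots, \eta_i$, we have

    $X_\infty(A_u) = \eta_{(i:j+1)} - \eta_{(i:j)}$ if $\eta_{(i:j)} < \eta_{i+1} < \eta_{(i:j+1)}$.

(c) The random variables

    $\xi_u := \frac{X_\infty(A_{u0})}{X_\infty(A_u)}$,    $u \in \mathbb{V}$,

are independent, and $\mathcal{L}(\xi_u) = \mathrm{unif}(0,1)$ for all $u \in \mathbb{V}$.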

Proof. Let $u$, $\tau(u)$, $i$ and $\eta_{(i:j)}$, $j = 0, \ldots, i+1$, be as in part (b) of the theorem.
The order property (4) of the labeled binary search trees implies that for a node
$v$ with label $\eta_k$, $k > i$, the relation $v \ge u$ is equivalent to $\eta_{(i:j)} < \eta_k < \eta_{(i:j+1)}$.
Hence, by the law of large numbers,

    $\lim_{n\to\infty} \mathrm{unif}(X_n)(A_u) = \lim_{n\to\infty} \frac{1}{n}\,\#\{v \in X_n : v \ge u\}$
        $= \lim_{n\to\infty} \frac{1}{n}\,\#\{i < k \le n : \eta_k \in (\eta_{(i:j)}, \eta_{(i:j+1)})\}$
        $= \eta_{(i:j+1)} - \eta_{(i:j)}$

with probability 1 for every $u \in \mathbb{V}$. In view of

M = {t)£V: d(u, u) < 2" 1 " 1 - 1 } for all u e V 

the one-point sets with elements from $\mathbb{V}$ are open in the topology on $\bar{\mathbb{V}}$. As $\mathrm{unif}(X_n)$
assigns at most the value $1/n$ to such a set it follows with the portmanteau
theorem [Bil68, p. 11] that any limit point of this sequence is concentrated on $\partial\mathbb{V}$. Parts
(a) and (b) of the theorem now follow with the above general remarks on weak
convergence of probability measures on $(\bar{\mathbb{V}}, \mathcal{B}(\bar{\mathbb{V}}))$.

For the proof of (c) we use the following well-known fact: The conditional
distribution of $\eta_{i+1}$, given $\eta_1, \ldots, \eta_i$ and given that the value lands in an interval
$I = (\eta_{(i:j)}, \eta_{(i:j+1)})$ of the augmented order statistics, is the uniform distribution on
$I$, which implies that $\mathrm{unif}(0,1)$ is the distribution of the normalized distance $\xi_u$ to
the left endpoint of $I$. For different nodes these relative insertion positions are
independent, hence $\xi_u$, $u \in \mathbb{V}$, are independent and uniformly distributed on the
unit interval. □

We note the following consequence of the representation in part (c) of the
theorem: For a fixed $u \in \mathbb{V}$ let

    $\emptyset = u(0) < u(1) < \cdots < u(k) = u$

with $|u(j)| = j$ for $j = 0, \ldots, k$ be the path that connects $u$ to the root node. We
then have

(11)    $X_\infty(A_u) = \prod_{j=0}^{k-1} \tilde\xi_{u(j)}$  with  $\tilde\xi_{u(j)} := \begin{cases} \xi_{u(j)}, & \text{if } u(j+1) = u(j)0, \\ 1 - \xi_{u(j)}, & \text{if } u(j+1) = u(j)1. \end{cases}$

Note that the factors $\tilde\xi_{u(j)}$, $j = 0, \ldots, k-1$, are independent and that they all have
distribution $\mathrm{unif}(0,1)$.
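The product representation suggests a simple way to sample values of the limit measure on the sets attached to finitely many nodes. The sketch below assumes independent unif(0,1) split variables, one per node, generated lazily; `make_limit` and `mass` are hypothetical names, and nodes are again tuples of 0s and 1s.

```python
import random


def make_limit(rng):
    """Return a function u -> mass of the set attached to node u.

    The mass is the product along the path from the root to u of xi (left
    step) or 1 - xi (right step), with one independent uniform xi per node.
    """
    xi = {}                                   # split variables, sampled on demand

    def mass(u):
        m = 1.0
        for j, step in enumerate(u):
            v = u[:j]                         # ancestor of u at depth j
            if v not in xi:
                xi[v] = rng.random()
            m *= xi[v] if step == 0 else 1.0 - xi[v]
        return m

    return mass


mass = make_limit(random.Random(0))
u = (0, 1, 1)
left, right = mass(u + (0,)), mass(u + (1,))
```

By construction the masses of the two children add up to the mass of the parent, so the values are consistent with a probability measure on the infinite 0-1 sequences.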

Theorem 1 confirms the view expressed in [Woe09, pp. 191 and 218] that in
specific cases embeddings (or boundaries) can generally be obtained directly on using
the then available additional structure; here this turns out to be the algorithmic
representation of the Markov chain. However, there are two additional benefits
of the general theory: First, because of the space-time property (1) the limit $X_\infty$
generates the tail $\sigma$-field

    $\mathcal{T} := \bigcap_{n=1}^{\infty} \sigma(\{X_m : m \ge n\})$

associated with the sequence $(X_n)_{n\in\mathbb{N}}$. This may serve as a starting point for
the unification of strong limit theorems for functionals $(Y_n)_{n\in\mathbb{N}}$, $Y_n = \Phi(X_n)$, of the
discrete structures: If $Y_n$ converges to $Y_\infty$ in a 'reasonable' space, then the limit $Y_\infty$,
which is $\mathcal{T}$-measurable, must be a function of $X_\infty$ (see e.g. [Kal97, Lemma 1.13]).
The second general result is extremely useful in the context of the calculations
that arise in specific applications of the theory: The conditional distribution of
the chain $(X_n)_{n\in\mathbb{N}}$ given the value of $X_\infty$ is again a Markov chain, where the new
transition probabilities can be obtained from the limit value and the old transition
probabilities by a procedure that is known as Doob's $h$-transform. In the present
situation it turns out that the conditional distribution of the BST chain, given
$X_\infty = \mu$, is the same as that of the DST chain driven by $\mu$. We refer the reader
to [EGW12] for details; the last statement appears there only for a specific $\mu$,
but the generalization to an arbitrary probability measure $\mu$ in the boundary is
straightforward. Roughly, the embedded jump chains at the individual nodes are
Polya urns, for these the boundary has been obtained in [BK64], and from the
general construction of the Doob-Martin boundary it is clear that the outcome is
unaffected by the step from a Markov chain to its embedded jump chain. We collect
some consequences in the following proposition, where

(12)    $\mathcal{F}_n := \sigma(X_1, \ldots, X_n)$,    $n \in \mathbb{N}$,

are the elements of the natural filtration of the BST chain.

Proposition 2. With the notation and assumptions as in Theorem 1,

(13)    $\mathcal{L}\bigl(\sigma(X_n,u0) \mid \sigma(X_n,u) = k,\ \xi_u = p\bigr) = \mathrm{Bin}(k-1, p)$ if $k > 0$,

and, for all $i, j \in \mathbb{N}_0$,

(14)    $\mathcal{L}\bigl(\xi_u \mid \sigma(X_n,u0) = i,\ \sigma(X_n,u1) = j\bigr) = \mathrm{Beta}(i+1, j+1)$.

Further, the variables $(\xi_u)_{u\in\mathbb{V}}$ are conditionally independent given $\mathcal{F}_n$.

3. Metric aspects 

All trees in this paper are subgraphs of the complete binary tree, which has $\mathbb{V}$
as its set of nodes and $\{(\bar u, u) : u \ne \emptyset\}$ as its set of edges; in particular, our trees
are specified by their node sets $x$. In a tree metric $d$ the distance of any two nodes
$u, v$ is the sum of the distances between successive nodes on the unique path from
$u$ to $v$, which means that such a metric is given by its values $d(\bar u, u)$, $u \in x$, $u \ne \emptyset$.
For example, the metric $d_{\mathbb{V}}$ in Section 2.3 has $d_{\mathbb{V}}(\bar u, u) = 2^{-|u|-1}$, and the canonical
tree distance $d_{\mathrm{can}}$ is given by $d_{\mathrm{can}}(\bar u, u) = 1$. For our trees the prefix order further
leads to

(15)    $d(u,v) = d(u,\emptyset) + d(v,\emptyset) - 2\,d(u \wedge v, \emptyset)$ for all $u, v \in x$.

Metric trees may also be interpreted as graphs with edge weights, where the edge
$(\bar u, u)$ receives the weight $d(\bar u, u)$.

Our aim in this section is to rephrase the convergence of the BST sequence as a
convergence of metric trees, and to show that this view leads to convergence with
respect to stronger topologies. The situation here is much simpler than for Aldous'
continuum random tree where the Gromov-Hausdorff convergence of equivalence
classes of metric trees is used; see [Eva08] and the references given there. In fact,
the search trees considered here have node sets that grow monotonically to the full
$\mathbb{V}$, so we may define convergence of a sequence $((x_n, d_n))_{n\in\mathbb{N}}$ of metric binary trees
to $(\mathbb{V}, d_\infty)$ to mean that

(16)    $\lim_{n\to\infty} d_n(u,v) = d_\infty(u,v)$ for all $u, v \in \mathbb{V}$,

which of course is equivalent to $\lim_{n\to\infty} d_n(\bar u, u) = d_\infty(\bar u, u)$ for all $u \in \mathbb{V}$, $u \ne \emptyset$.
Note that $d_{\mathbb{V}}$ and $d_{\mathrm{can}}$ are both local metrics in the sense that $d(u,v)$ does not
depend on the tree $x$ as long as $u, v \in x$.

Motivated by the view in Section 2.3 of finite and infinite binary trees as
probability measures $\mu$ on $(\bar{\mathbb{V}}, \mathcal{B}(\bar{\mathbb{V}}))$ we now introduce the (relative) subtree size metric,
which assigns $\mu(A_u)$ to the distance of $\bar u$ and $u$, i.e.

    $d_x(\bar u, u) = \frac{\sigma(x,u)}{\sigma(x,\emptyset)}$ for all $u \in x$, $u \ne \emptyset$,

if $x \in \mathbb{B}$, and

    $d_\mu(\bar u, u) = \mu(A_u)$ for all $u \in \mathbb{V}$, $u \ne \emptyset$,

for the complete tree and a probability measure $\mu$ on $(\partial\mathbb{V}, \mathcal{B}(\partial\mathbb{V}))$, where we assume
that $\mu(A_u) > 0$ for all $u \in \mathbb{V}$. Again, there is an algorithmic motivation: In terms
of the BST mechanism the weight of an edge $(\bar u, u)$ is the (relative) number of times
this edge has been traversed in the construction of the tree. These metrics depend
on their tree in a global manner.
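To make the definition concrete, the following sketch computes subtree size distances in a finite tree by combining the edge weights with the tree metric formula (15). It assumes trees are sets of 0-1 tuples; the helper names are illustrative and not part of the paper.

```python
def sigma(tree, u):
    """Size of the subtree rooted at u: number of nodes that have u as a prefix."""
    return sum(1 for v in tree if v[:len(u)] == u)


def last_common_ancestor(u, v):
    k = 0
    while k < min(len(u), len(v)) and u[k] == v[k]:
        k += 1
    return u[:k]


def dist_to_root(tree, u):
    """Sum of the edge weights sigma(x, w) / sigma(x, root) along the path to u."""
    n = len(tree)
    return sum(sigma(tree, u[:j]) for j in range(1, len(u) + 1)) / n


def subtree_size_distance(tree, u, v):
    """Tree metric obtained from the distances to the root, as in formula (15)."""
    w = last_common_ancestor(u, v)
    return dist_to_root(tree, u) + dist_to_root(tree, v) - 2 * dist_to_root(tree, w)


tree = {(), (0,), (1,), (0, 1)}
d = subtree_size_distance(tree, (0, 1), (1,))
```

In this four-node example the path from `(0, 1)` to `(1,)` uses the edges into `(0, 1)`, `(0,)` and `(1,)`, with relative subtree sizes 1/4, 2/4 and 1/4.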

With this terminology in place we may now rephrase the convergence in
Theorem 1 as the convergence in the sense of (16) of the finite metric trees $(X_n, d_{X_n})$
to the infinite metric tree $(\mathbb{V}, d_{X_\infty})$, almost surely and as $n \to \infty$.

By construction the Doob-Martin compactification is the weakest topology that
allows for a continuous extension of the functions $\mathbb{B} \ni x \mapsto \sigma(x,u)/\sigma(x,\emptyset)$, $u \in \mathbb{V}$.
For the analysis of tree functionals stronger modes of convergence turn out to be
useful; for example, do we have uniform convergence in (16)? Also, subtree sizes
decrease along paths leading away from the root node, so we may consider a weight
factor for the distance of a node to its parent that depends on the depth of the node:
For all $p \ge 1$, we define the weighted subtree size metric with weight parameter $p$
by

    $d_{x,p}(\bar u, u) := p^{|u|}\, d_x(\bar u, u)$,    $d_{\mu,p}(\bar u, u) := p^{|u|}\, d_\mu(\bar u, u)$,

in the finite and infinite case respectively. Of course, with $p = 1$ the subtree size
metric reappears.

Theorem 3. Let $p_0 = 1.26107\ldots$ be the smaller of the two roots of the equation
$2e\log(p) = p$, $p > 0$. Let $X_n$, $n \in \mathbb{N}$, and $X_\infty$ be as in Theorem 1.

(a) For $p < p_0$, the metric space $(\mathbb{V}, d_{X_\infty,p})$ is compact with probability 1.

(b) For $p > p_0$, the metric space $(\mathbb{V}, d_{X_\infty,p})$ has infinite diameter with probability 1.

(c) For $p < p_0$, the metric spaces $(X_n, d_{X_n,p})$ converge uniformly to $(\mathbb{V}, d_{X_\infty,p})$ as
$n \to \infty$ in the sense of

(17)    $\sup_{u,v\in X_n} \bigl|d_{X_n,p}(u,v) - d_{X_\infty,p}(u,v)\bigr| \to 0$ almost surely and in mean.

(d) For $p > p_0$, and with $d_{X_n,p}(\bar u, u) := 0$ for $u \notin X_n$,

    $\sup_{u,v\in\mathbb{V}} \bigl|d_{X_n,p}(u,v) - d_{X_\infty,p}(u,v)\bigr| = \infty$ with probability 1.
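The critical value in Theorem 3 is easy to reproduce numerically. The sketch below uses plain bisection on the interval [1, 2], which is an assumption made here for illustration: the function 2e log(p) - p is negative at 1, positive at 2, and increasing in between, so the smaller root lies in that interval.

```python
import math


def f(p):
    """Left-hand side minus right-hand side of the equation 2e log(p) = p."""
    return 2.0 * math.e * math.log(p) - p


def bisect(lo, hi, tol=1e-12):
    """Bisection for a root of f, assuming f(lo) < 0 < f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


p0 = bisect(1.0, 2.0)
```

The computed value agrees with the digits 1.26107 quoted in the theorem.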

Proof. We embed the metric trees into the linear space $L(0)$ of all functions
$f : \mathbb{V} \setminus \{\emptyset\} \to \mathbb{R}$ via

    $x \mapsto f := (u \mapsto d_x(\bar u, u))$,    $x \in \mathbb{B}$;

probability measures $\mu$ on $(\bar{\mathbb{V}}, \mathcal{B}(\bar{\mathbb{V}}))$ become elements of $L(0)$ by identifying $\mu$ with
the function $u \mapsto \mu(A_u)$. In particular, we now write $X_\infty(u)$ instead of $X_\infty(A_u)$.

For $p \ge 1$ let $L(p)$ be the set of all $f \in L(0)$ with

    $\|f\|_p := \sum_{k=1}^{\infty} p^k \max_{|u|=k} |f(u)| < \infty$.

Clearly, this gives a family of nested separable Banach spaces, with

    $\mathbb{B} \to L(\gamma) \subset L(p) \subset L(0)$ for $1 \le p < \gamma$.

We now show that, with the above identification,

(18)    $E\|X_\infty\|_p < \infty$ if $p < p_0$,

(19)    $P\bigl(\sup_{u\in\mathbb{V}} p^{|u|} X_\infty(u) = \infty\bigr) = 1$ if $p > p_0$,

and that, for $p < p_0$ and as $n \to \infty$,

(20)    $\|X_n - X_\infty\|_p \to 0$ almost surely and in mean.

Clearly, (18) implies that $X_\infty \in L(p)$ with probability 1 if $p < p_0$.

The basis for our proof of (18) and (19) is the connection of BST trees to
branching random walks, a connection that has previously been used by several authors,
especially for the analysis of the height of search trees; see the survey [Dev98] and
the references given there. Let $u(k,j)$, $j = 1, \ldots, 2^k$, be a numbering of the nodes
from $\mathbb{V}_k$ such that

    $\beta(u(k,1)) < \beta(u(k,2)) < \cdots < \beta(u(k,2^k))$,

with $\beta$ as defined in (2). The key observation is that the variables

    $Y_{k,j} := -\log X_\infty(u(k,j))$,    $j = 1, \ldots, 2^k$,

are the positions of the members of the $k$th generation in a branching random walk
with offspring distribution $\delta_2$ and with

    $Z := \delta_{-\log \xi} + \delta_{-\log(1-\xi)}$,    $\mathcal{L}(\xi) = \mathrm{unif}(0,1)$,

for the point process of the positions of the children relative to their parent.
Biggins [Big77] obtained several general results for such processes that we now
specialize to the present offspring distribution and point process of relative positions.
Let

    $m(\theta) := E\Bigl(\int e^{-\theta t}\, Z(dt)\Bigr) = \frac{2}{1+\theta}$

and

(21)    $\tilde m(a) := \inf\{e^{\theta a} m(\theta) : \theta \ge 0\} = 2a\,e^{1-a}$.

Note that

(22)    $\tilde m(a) = e^{\theta(a)a}\, m(\theta(a))$ with $\theta(a) = \frac{1}{a} - 1$,

and that, by definition of $p_0$,

(23)    $p < p_0 \iff \tilde m(\log p) < 1$.

Finally, let (£) be the number of particles in generation k that are located to 
the left of t. 

Now suppose that $p < p_0$. Let $\alpha := (p + p_0)/2$ and $\eta := \log(\alpha)$. We adapt the
upper bound argument in [Big77] to our present needs: For all $\theta > 0$ and $C > 1$,
with $\gamma := \log(C)$,

    $P\bigl(\alpha^k \max_{|u|=k} X_\infty(u) > C\bigr) = P\bigl(\min_{1\le j\le 2^k} Y_{k,j} < k\eta - \gamma\bigr)$
        $\le \exp\bigl(\theta(k\eta - \gamma)\bigr)\, m(\theta)^k$
        $= C^{-\theta} \bigl(e^{\theta\eta} m(\theta)\bigr)^k$.

By (23), $\tilde m(\eta) < 1$. Choosing the optimal $\theta = \theta(\eta)$, which with (22) is easily seen
to be greater than 1, leads to

    $E\bigl(\alpha^k \max_{|u|=k} X_\infty(u)\bigr) \le 1 + \int_1^\infty P\bigl(\alpha^k \max_{|u|=k} X_\infty(u) > x\bigr)\, dx$
        $\le 1 + \tilde m(\eta)^k \int_1^\infty x^{-\theta(\eta)}\, dx \le c$,

with a finite constant $c$ that does not depend on $k$. Hence

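    $E\Bigl(\sum_{k=1}^{K} p^k \max_{|u|=k} X_\infty(u)\Bigr) = \sum_{k=1}^{K} \Bigl(\frac{p}{\alpha}\Bigr)^k E\bigl(\alpha^k \max_{|u|=k} X_\infty(u)\bigr) \le c \sum_{k=1}^{K} \Bigl(\frac{p}{\alpha}\Bigr)^k$ for all $K \in \mathbb{N}$,

which in turn implies (18) by monotone convergence, as $p < \alpha$.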

Suppose now that $p > p_0$, so that $\tilde m(\eta) > 1$ by (23) for $\eta := \log p$. By [Big77,
Theorem 2],

    $\lim_{k\to\infty} \frac{1}{k} \log\bigl(\#\{1 \le j \le 2^k : Y_{k,j} \le k\eta\}\bigr) = \log \tilde m(\eta) > 0$

with probability 1. In particular, and again with probability 1,

    $\exists\, k_0\ \forall\, k \ge k_0\ \exists\, u \in \mathbb{V}_k : -\log X_\infty(u) \le k \log p$.

Clearly, this implies (19).

For the proof of (20) we first consider the random variables $\sigma(X_n, u)$, $n \in \mathbb{N}$, for
some fixed $u \in \mathbb{V}$. We wish to relate these to $E[X_\infty(u) \mid \mathcal{F}_n]$, with $\mathcal{F}_n$ as in (12).
For this, we use the representation of $X_\infty$ in terms of $(\xi_u)_{u\in\mathbb{V}}$ given in Section 2.3
together with Proposition 2. We may assume that $k := |u| > 0$.

The representation (11), the conditional independence of the $\xi$-variables given
$\mathcal{F}_n$, and the well known formula for the first moment of beta distributions together
lead to

    $E[X_\infty(u) \mid \mathcal{F}_n] = \prod_{j=0}^{k-1} \frac{\sigma(X_n, u(j+1)) + 1}{\sigma(X_n, u(j)0) + \sigma(X_n, u(j)1) + 2}$.

In view of

    $\sigma(x, u0) + \sigma(x, u1) + 1 = \begin{cases} \sigma(x,u), & \text{if } u \in x, \\ 1, & \text{if } u \notin x, \end{cases}$

the product telescopes to

(24)    $E[X_\infty(u) \mid \mathcal{F}_n] = \frac{\sigma(X_n,u) + 1}{n+1}$ for all $u \in X_n$.

We now introduce

    $Z_n : \mathbb{V} \to \mathbb{R}$,    $u \mapsto E[X_\infty(u) \mid \mathcal{F}_n]$.

Then $(Z_n, \mathcal{F}_n)_{n\in\mathbb{N}}$ is a vector-valued martingale. For $p < p_0$ we have by part (a)
of the theorem that $X_\infty \in L(p)$ with probability 1 and that $E\|X_\infty\|_p < \infty$, hence
$Z_n \to X_\infty$ almost surely and in mean in $L(p)$ by Proposition V-2-6 in [Nev75].
In our present representation of trees as functions on $\mathbb{V}$ we have

    $X_n(u) = \frac{\sigma(X_n,u)}{n}$,

which implies that $0 \le X_n \le (1 + n^{-1}) Z_n$ for all $n \in \mathbb{N}$. As $X_n \to X_\infty$ pointwise
with probability 1 by Theorem 1 we can now use a suitable version of the dominated
convergence theorem, such as given in [Kal97, Theorem 1.21], to obtain that $X_n$
converges to $X_\infty$ in $L(p)$ as $n \to \infty$, again almost surely and in mean.

It remains to show that the tree statements in the theorem follow from the linear
space statements (18), (19), and (20).

For (a) we prove that the limiting metric space is totally bounded. From (18)
and the definition of the norm we obtain for any given $\epsilon > 0$ a $k = k(\epsilon) \in \mathbb{N}$ such
that

    $\sum_{j=k}^{\infty} p^j \max_{|u|=j} X_\infty(u) < \epsilon$,

which by the definition of the weighted subtree size metric means that all nodes $v$
with $|v| > k$ have a distance from their predecessor at level $k$ that is less than $\epsilon$. As
there are only finitely many nodes of level less than $k$ this shows that the whole of
$\mathbb{V}$ may be covered by a finite number of $\epsilon$-balls. Of course, this argument is meant
to be applied to each element of a suitable set of probability 1 separately.

For (b) we simply note that (19) implies that, with probability 1,

    $\sup_{u\in\mathbb{V},\, u\ne\emptyset} d_{X_\infty,p}(\bar u, u) = \infty$

if $p > p_0$. This also gives (d).
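Finally, for all $u \in \mathbb{V}$, $u \ne \emptyset$,

    $\bigl|d_{X_n,p}(u,\emptyset) - d_{X_\infty,p}(u,\emptyset)\bigr| \le \sum_{k=1}^{|u|} p^k \bigl|X_n(u(k)) - X_\infty(u(k))\bigr| \le \|X_n - X_\infty\|_p$.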
The upper bound does not depend on $u$, hence (c) follows on using (15). □

We note that the convergence of metric trees considered in Theorem 3 implies the
convergence with respect to the Gromov-Hausdorff distance of the corresponding
equivalence classes of metric trees; see [BBI01, Section 7.3.3].

The subtree size metric also leads to a visualization of search trees: We use
the function $\beta$ defined in (2) to map nodes to points in the unit interval, and
above the $x$-coordinate $\beta(u)$ we draw a line parallel to the $y$-axis from $d_{X_n}(\bar u, \emptyset)$
to $d_{X_n}(u, \emptyset)$. In order to obtain a visually more pleasing result we may add lines
that run parallel to the $x$-axis, connecting nodes with the same parent. In Figure 1
we have carried this out for the trees arising from two separate input sequences for
the BST algorithm, with the data obtained from alternating blocks of length 10 of
digits in the decimal expansion of $\pi - 3$. The upper part refers to the odd, the lower
to the even numbered blocks. In both cases we have given the trees for $n = 50$
and $n = 100$, and with $p = 1$. Vertically, the trees are from the same distribution;
moving horizontally to the right, we have almost sure convergence.



4. Tree functionals

In this section we show how the above results can be used in connection with
the asymptotic analysis of tree functionals. Here is the recipe: We start with a
functional $Y_n = \Psi_n(X_n)$ of the trees, with (deterministic) functions $\Psi_n$ on $\mathbb{B}_n$ that
have values in some separable Banach space $(L, \|\cdot\|)$. We suspect that $Y_n$ converges
almost surely to some limit variable $Y_\infty$ as $n \to \infty$. We know that if this is the
case then $Y_\infty = \Psi_\infty(X_\infty)$ for some $\Psi_\infty$ defined on $\partial\mathbb{B}$ (as always, almost surely).
We don't know what $\Psi_\infty$ is, but if we manage to rewrite the $\Psi_n$'s in terms of
subtree sizes, then Theorem 1 will lead to an educated guess. On that basis we
next consider $\Phi_n(X_n) = E[\Psi_\infty(X_\infty) \mid \mathcal{F}_n]$, assuming that $E\|\Psi_\infty(X_\infty)\| < \infty$. This
gives an $L$-valued martingale. By the associated convergence theorem we then have
that $\tilde Y_n := \Phi_n(X_n)$ converges to $Y_\infty$ almost surely and in mean. Finally, a simple
inspection of $\Phi_n - \Psi_n$ may reveal that $\tilde Y_n - Y_n$ is asymptotically negligible; indeed,
if $Y_n$ converges to $Y_\infty$, then $\tilde Y_n - Y_n$ must tend to 0.

Throughout this section we abbreviate $X_\infty(A_u)$ to $X_\infty(u)$.

4.1. Path length. The first tree functional we consider is the internal path length,

(25)    $\mathrm{IPL}(x) := \sum_{u\in x} |u|$,    $x \in \mathbb{B}$,

which may be rewritten as

(26)    $\mathrm{IPL}(x) = \sum_{u\in x,\, u\ne\emptyset} \sigma(x,u)$.
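The rewriting of the internal path length in terms of subtree sizes can be checked on small examples. The sketch below assumes the standard definition of the internal path length as the sum of all node depths, and uses sets of 0-1 tuples for trees; the function names are illustrative.

```python
def ipl_by_depths(tree):
    """Internal path length as the sum of the depths of all nodes."""
    return sum(len(u) for u in tree)


def ipl_by_subtree_sizes(tree):
    """The same quantity as the sum of subtree sizes over all non-root nodes."""
    def sigma(u):
        return sum(1 for v in tree if v[:len(u)] == u)
    return sum(sigma(u) for u in tree if u != ())


tree = {(), (0,), (1,), (0, 1), (0, 1, 0)}
a = ipl_by_depths(tree)
b = ipl_by_subtree_sizes(tree)
```

Both functions agree because a node at depth k is counted once in the subtree of each of its k non-root ancestors, itself included.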




Let

    $H(0) := 0$,    $H(n) := \sum_{i=1}^{n} \frac{1}{i}$ for all $n \in \mathbb{N}$,

be the harmonic numbers. It is well known that

    $H(n) = \log n + \gamma + o(1)$ as $n \to \infty$,

where $\gamma \approx 0.57722$ is Euler's constant. We need two auxiliary statements; we omit
the (easy) proofs.

Figure 1. The metric tree for the odd (upper part) and even
(lower part) $\pi$-data, for $n = 50$ (left) and $n = 100$ (right)
respectively; see text for details.

Lemma 4. For all $i, j \in \mathbb{N}_0$,

(27)    $\frac{\Gamma(i+j+2)}{\Gamma(i+1)\Gamma(j+1)} \int_0^1 x^i (1-x)^j \log(x)\, dx = H(i) - H(i+j+1)$.

For a random variable $\eta$ with distribution $\mathrm{Beta}(i+1, j+1)$ Lemma 4 leads to

    $E\bigl(\eta \log(\eta)\bigr) = \frac{i+1}{i+j+2}\, \bigl(H(i+1) - H(i+j+2)\bigr)$.
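The identity of Lemma 4 can be checked numerically for small indices. This is a rough sketch only: it approximates the integral with a midpoint rule, which is an arbitrary choice made here for simplicity, and the names `H` and `lemma4_lhs` are illustrative.

```python
import math
from fractions import Fraction


def H(n):
    """Harmonic number H(n) as an exact fraction, with H(0) = 0."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


def lemma4_lhs(i, j, steps=200_000):
    """Midpoint-rule approximation of the normalised integral of x^i (1-x)^j log(x)."""
    const = math.exp(math.lgamma(i + j + 2) - math.lgamma(i + 1) - math.lgamma(j + 1))
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h                      # midpoint of the k-th subinterval
        total += x ** i * (1.0 - x) ** j * math.log(x)
    return const * total * h


approx = lemma4_lhs(2, 3)
exact = float(H(2) - H(6))
```

The logarithmic singularity at 0 is integrable, so the midpoint rule converges, although more slowly than for a smooth integrand.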

The next lemma is a summation by parts formula for binary trees. 
Lemma 5. For any function $\psi : \mathbb{V} \to \mathbb{R}$,

    $\sum_{u\in x} \bigl(\psi(u) - \psi(u0) - \psi(u1)\bigr) = \psi(\emptyset) - \sum_{u\in\partial x} \psi(u)$ for all $x \in \mathbb{B}$.
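The summation by parts formula is a telescoping identity: every non-root node of the tree appears once with a plus sign (as itself) and once with a minus sign (as a child of its parent), so only the root and the external nodes survive. The sketch below checks this on a small tree, with an arbitrary integer-valued test function chosen here for illustration.

```python
def external_nodes(tree):
    """Nodes that are not in the tree but whose parent is."""
    return [u + (b,) for u in tree for b in (0, 1) if u + (b,) not in tree]


def lemma5_lhs(tree, psi):
    """Sum over the tree of psi(u) - psi(u0) - psi(u1)."""
    return sum(psi(u) - psi(u + (0,)) - psi(u + (1,)) for u in tree)


def lemma5_rhs(tree, psi):
    """psi at the root minus the sum of psi over the external nodes."""
    return psi(()) - sum(psi(u) for u in external_nodes(tree))


def psi(u):
    # arbitrary test function on nodes, integer-valued to avoid rounding
    return 3 ** len(u) + sum(u)


tree = {(), (0,), (1,), (0, 1)}
```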

Major parts of the following theorem are known; we will give details later in
order to be able to refer to the proof for a comparison of the methods used. Let
$(X_n)_{n\in\mathbb{N}}$ be the BST chain and let $X_\infty$ be its limit, as in Theorem 3.

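Theorem 6. Let $C : (0,1) \to \mathbb{R}$ be defined by

    $C(s) := 1 + 2\bigl(s \log(s) + (1-s) \log(1-s)\bigr)$.

(a) The limit

    $Y_\infty := \sum_{u\in\mathbb{V}} X_\infty(u)\, C\Bigl(\frac{X_\infty(u0)}{X_\infty(u)}\Bigr)$

exists almost surely and in quadratic mean.

(b) As $n \to \infty$,

(28)    $\frac{1}{n}\,\mathrm{IPL}(X_n) - 2\log n \to 2\gamma - 4 + Y_\infty$

almost surely and in quadratic mean.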



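Proof. From the representation of $X_\infty$ given in Section 2.3 we know that the random
variables

    $\xi_u := \frac{X_\infty(u0)}{X_\infty(u)}$, $u \in V$,

are independent and uniformly distributed on the unit interval, and that $X_\infty(u)$ is
a function of the $\xi_v$'s with $v < u$. In particular, for all nodes $u$, the two factors in
the sum appearing in the definition of $Y_\infty$ are independent. Let $\mathcal{G}_k$ be the $\sigma$-field
generated by the $\xi_u$'s with $|u| \le k$ and put

    $Y_k := \sum_{u \in V,\, |u| \le k} X_\infty(u)\, C\!\left(\frac{X_\infty(u0)}{X_\infty(u)}\right)$, $k \in \mathbb{N}$.

Then these properties lead to

    $E[Y_{k+1} \mid \mathcal{G}_k] = Y_k + E\Bigl[\sum_{|u| = k+1} X_\infty(u)\, C(\xi_u) \Bigm| \mathcal{G}_k\Bigr]$
    $\qquad = Y_k + \sum_{|u| = k+1} X_\infty(u)\, E C(\xi_u)$
    $\qquad = Y_k$,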
where we have used the fact that $E C(\xi_u) = 0$. Further, with the same arguments,

    $E[(Y_{k+1} - Y_k)^2 \mid \mathcal{G}_k] = E\Bigl[\Bigl(\sum_{|u| = k+1} X_\infty(u)\, C(\xi_u)\Bigr)^2 \Bigm| \mathcal{G}_k\Bigr] = \sum_{|u| = k+1} X_\infty(u)^2\, E C(\xi_u)^2$,

so that

    $E(Y_{k+1} - Y_k)^2 = \sum_{|u| = k+1} E X_\infty(u)^2\, E C(\xi_u)^2$.

We also have $\kappa := E C(\xi_u)^2 < \infty$, and using (11) we get

    $E X_\infty(u)^2 = (E\xi^2)^{|u|} = 3^{-|u|}$,

so that

(29)    $E(Y_{k+1} - Y_k)^2 = 2^{k+1}\, 3^{-(k+1)}\, \kappa$ for all $k \in \mathbb{N}$.



Taken together these calculations show that $(Y_k, \mathcal{G}_k)_{k \in \mathbb{N}}$ is an $L^2$-bounded martingale,
and an appeal to the corresponding martingale limit theorem completes the
proof of (a). In particular, $Y_\infty$ is well defined and even has finite second moment.

For the proof of (b) let $Z_n := E[Y_\infty \mid \mathcal{F}_n]$, $n \in \mathbb{N}$, so that $(Z_n, \mathcal{F}_n)_{n \in \mathbb{N}}$ is again a
martingale bounded in $L^2$. Our plan is to show that $Z_n$ is sufficiently close to the
transformed internal path length that appears in (28).

Using again the stochastic structure of $X_\infty$ we are thus led to consider the
conditional expectations $E[X_\infty(u) \mid \mathcal{F}_n]$ and $E[C(\xi_u) \mid \mathcal{F}_n]$, $u \in V$ and $n \in \mathbb{N}$. From
Proposition 2 we know that, for all $u \in X_n$,

    $\mathcal{L}(\xi_u \mid \mathcal{F}_n) = \mathrm{Beta}\bigl(\sigma(X_n, u0) + 1,\ \sigma(X_n, u1) + 1\bigr)$,

and that the $\xi_u$'s are conditionally independent given $\mathcal{F}_n$. Hence Lemma 4 can be
applied, see also (27), resulting in

(30)    $E[C(\xi_u) \mid \mathcal{F}_n] = 1 + \frac{2\tau(X_n, u0) + 2\tau(X_n, u1)}{\sigma(X_n, u0) + \sigma(X_n, u1) + 2} - 2 H\bigl(\sigma(X_n, u0) + \sigma(X_n, u1) + 2\bigr)$,

where the function $\tau : B \times V \to \mathbb{R}$ is given by

    $\tau(x, u) := (\sigma(x, u) + 1)\, H(\sigma(x, u) + 1)$.

For each fixed $n \in \mathbb{N}$, almost sure convergence of $E[Y_k \mid \mathcal{F}_n]$ to $E[Y_\infty \mid \mathcal{F}_n]$ as $k \to \infty$
follows from

    $\|E[Y_k \mid \mathcal{F}_n] - E[Y_\infty \mid \mathcal{F}_n]\|_2 \le \|Y_k - Y_\infty\|_2$,

the upper bound in (29), and the Borel-Cantelli lemma. Together with the conditional
independence of $X_\infty(u)$ and $C(\xi_u)$ given $\mathcal{F}_n$ this leads to

(31)    $Z_n = \sum_{u \in V} E[X_\infty(u) \mid \mathcal{F}_n]\, E[C(\xi_u) \mid \mathcal{F}_n]$.

From (30) we obtain $E[C(\xi_u) \mid \mathcal{F}_n] = 0$ for $u \notin X_n$, and, clearly,

(32)    $\sigma(X_n, u0) + \sigma(X_n, u1) + 1 = \sigma(X_n, u)$ for all $u \in X_n$.
Taken together, the formula for $E[X_\infty(u) \mid \mathcal{F}_n]$, (30), (31) and (32) lead to

    $Z_n = \sum_{u \in X_n} \frac{\sigma(X_n,u)+1}{n+1} \Bigl(1 + \frac{2\tau(X_n,u0) + 2\tau(X_n,u1)}{\sigma(X_n,u)+1} - 2H(\sigma(X_n,u)+1)\Bigr)$,

which in turn gives

    $Z_n = \frac{1}{n+1}\bigl(\mathrm{IPL}(X_n) + 2n\bigr) - \frac{2}{n+1} \sum_{u \in X_n} \bigl(\tau(X_n,u) - \tau(X_n,u0) - \tau(X_n,u1)\bigr)$.

Lemma 5 can be applied to the second sum, and the assertion finally follows from
$\tau(X_n, \emptyset) = (n+1) H(n+1)$ and $\tau(X_n, u) = 1$ for $u \in \partial X_n$. □
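The last step of the proof can be traced in exact arithmetic. The sketch below is our own illustration under stated assumptions: it uses $E[X_\infty(u) \mid \mathcal{F}_n] = (\sigma(X_n,u)+1)/(n+1)$ for $u \in X_n$ and the expression (30) for $E[C(\xi_u) \mid \mathcal{F}_n]$, with $\sigma = 0$ outside the tree. The closed form $\frac{\mathrm{IPL}(X_n) + 2n}{n+1} - 2H(n+1) + 2$ is what we obtain when applying the summation by parts lemma to $\tau(X_n, \cdot)$; it is not displayed in the text and should be read as an assumption of this example. All function names are hypothetical.

```python
import random
from fractions import Fraction


def bst_nodes(keys):
    """Insert keys one at a time (BST algorithm); return the set of node addresses."""
    key_at = {}
    for key in keys:
        u = ""
        while u in key_at:
            u += "0" if key < key_at[u] else "1"
        key_at[u] = key
    return set(key_at)


def subtree_sizes(nodes):
    """sigma[u] = number of nodes of the tree in the subtree rooted at u."""
    sigma = {u: 0 for u in nodes}
    for w in nodes:
        for k in range(len(w) + 1):
            sigma[w[:k]] += 1
    return sigma


def z_n(nodes):
    """Return (sum over tree nodes, closed form after summation by parts)."""
    n = len(nodes)
    H = [Fraction(0)]
    for k in range(1, n + 2):
        H.append(H[-1] + Fraction(1, k))       # H[k] is the k-th harmonic number
    sigma = subtree_sizes(nodes)

    def size(u):
        return sigma.get(u, 0)                 # sigma = 0 outside the tree

    def tau(u):
        return (size(u) + 1) * H[size(u) + 1]

    total = Fraction(0)
    for u in nodes:
        m = size(u + "0") + size(u + "1") + 2  # equals sigma(u) + 1
        cond_c = 1 + (2 * tau(u + "0") + 2 * tau(u + "1")) / m - 2 * H[m]
        total += Fraction(size(u) + 1, n + 1) * cond_c
    ipl = sum(len(u) for u in nodes)
    closed_form = Fraction(ipl + 2 * n, n + 1) - 2 * H[n + 1] + 2
    return total, closed_form


random.seed(3)
print(z_n(bst_nodes(random.sample(range(1000), 30))))
```

Because rational arithmetic is used throughout, the two returned values can be compared with `==` rather than up to rounding.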

Almost sure convergence of the standardized internal path length for the BST 
sequence has been obtained in [Reg89], and convergence in distribution, together
with a fixed point relation for the limit distribution, in [Ros91]. Our method
may be seen as an amalgamation of Régnier's martingale approach and Rösler's
approach, where the latter has come to be known as the contraction method in
the analysis of algorithms: We obtain a strong limit, but we do not need to 'find 
the martingale' (a task familiar to many an applied probabilist). The approach 
suggested in the present paper, to look at convergence of the full objects via a 
suitable completion of the state space of the underlying combinatorial Markov chain, 
leads to a representation of the almost sure limit. This gives the martingale by 
projection via conditional expectations, and from the representation one can also 
read off a fixed point relation for the distribution of the limit. 

4.2. The Wiener index. The canonical graph distance $d_{\mathrm{can}}(u,v)$ of any two nodes
$u$ and $v$ in a finite connected graph $G$ with node set $V$ is the minimum length of a
path (sequence of edges) that connects $u$ and $v$ in $G$. The sum of these distances
is the Wiener index of the graph,

(33)    $\mathrm{WI}(G) := \frac{1}{2} \sum_{(u,v) \in V \times V} d_{\mathrm{can}}(u,v)$,

introduced by the chemist H. Wiener. Some background together with pointers to
the literature is given in [Nei02], which is also our main reference in this subsection.
Among other results it is shown in [Nei02] that for the BST sequence $(X_n)_{n \in \mathbb{N}}$ the
rescaled Wiener indices,

    $W_n := \frac{1}{n^2}\,\mathrm{WI}(X_n) - 2\log n$,

converge in distribution as $n \to \infty$.

Again, we project a suitable functional $\Psi(X_\infty)$ of the limit tree $X_\infty$ to a function
$E[\Psi(X_\infty) \mid \mathcal{F}_n]$ of $X_n$ that is sufficiently close to $W_n$. This will give a strong limit
theorem, i.e. it turns out that the rescaled Wiener indices in fact converge almost
surely for the random binary trees generated by the BST algorithm for i.i.d. input,
and it will also lead to a representation of the limit $W_\infty$ as a function of $X_\infty$.

We begin by rewriting the Wiener index in terms of subtree sizes, similar to the
transition from (25) to (26) in the analysis of the internal path length. For a binary
tree $x$,

(34)    $\sum_{(u,v) \in x \times x} |u \wedge v| = \sum_{u \in x} \sigma(x,u)^2 - \sigma(x,\emptyset)^2$.
This may be proved by induction, using the left and right subtrees in the induction
step; see [Den09, p. 70]. Using (15), (25), (33) and (34) we now obtain

(35)    $\mathrm{WI}(X_n) = n\,\mathrm{IPL}(X_n) + n^2 - \sum_{u \in X_n} \sigma(X_n,u)^2$.
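Identity (35) lends itself to a brute-force check. The following sketch is illustrative only: it computes the Wiener index directly from its definition by breadth-first search on the tree, regarded as a graph whose edges join each node to its parent, and compares the result with $n\,\mathrm{IPL} + n^2 - \sum_u \sigma(\cdot,u)^2$. The 0/1-string encoding of nodes and the function names are assumptions of this example.

```python
import random
from collections import deque


def bst_nodes(keys):
    """Insert keys one at a time (BST algorithm); return the set of node addresses."""
    key_at = {}
    for key in keys:
        u = ""
        while u in key_at:
            u += "0" if key < key_at[u] else "1"
        key_at[u] = key
    return set(key_at)


def subtree_sizes(nodes):
    """sigma[u] = number of nodes of the tree in the subtree rooted at u."""
    sigma = {u: 0 for u in nodes}
    for w in nodes:
        for k in range(len(w) + 1):
            sigma[w[:k]] += 1
    return sigma


def wiener_index(nodes):
    """Half the sum of graph distances over all ordered pairs of nodes."""
    adj = {u: [] for u in nodes}
    for u in nodes:
        if u:                          # every non-root node is joined to its parent
            adj[u].append(u[:-1])
            adj[u[:-1]].append(u)
    total = 0
    for start in nodes:                # breadth-first search from every node
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    return total // 2


random.seed(4)
x = bst_nodes(random.sample(range(1000), 60))
n = len(x)
ipl = sum(len(u) for u in x)
sum_sq = sum(s * s for s in subtree_sizes(x).values())
print(wiener_index(x), n * ipl + n * n - sum_sq)
```

The search does not use the formula for tree distances in terms of common prefixes, so the comparison is a genuine cross-check of the identity rather than a restatement of it.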

It is a benefit of working with almost sure convergence that we can deal with the 
constituents on the right hand side of (35) separately (which means that we can
make use of Theorem 6), whereas in connection with convergence in distribution
one needs to consider the joint distribution of $\mathrm{IPL}(X_n)$ and $\mathrm{WI}(X_n)$; see [Nei02].

Theorem 7. The series

(36)    $Z_\infty := \sum_{u \in V} X_\infty(u)^2$

converges almost surely and in quadratic mean, and, as $n \to \infty$,

(37)    $\frac{1}{n^2}\,\mathrm{WI}(X_n) - 2\log n \;\to\; W_\infty$

again almost surely and in quadratic mean, where the limit is given by

(38)    $W_\infty := 2\gamma - 3 + Y_\infty - Z_\infty$,

with $Y_\infty$ as in Theorem 6.

Proof. Almost sure convergence in (36) follows with Theorem 3, and the moment
calculations below show that $E Z_\infty^2 < \infty$. In particular,

    $Z_n := E[Z_\infty \mid \mathcal{F}_n] \to Z_\infty$

almost surely and in quadratic mean. Again, the Markov property implies that
$Z_n$ can be written as a function of $X_n$. In order to obtain this function we first
consider a fixed node $u \in V$.
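From the representation of $X_\infty$ given in Section 2.3 we get, for a node $u$ with $|u| = k$,

    $X_\infty(u) = \prod_{j=0}^{k-1} \frac{X_\infty(u(j+1))}{X_\infty(u(j))}$.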



From (14) and the known formula for the second moment of beta distributions we
obtain, considering the cases $u(j+1) = u(j)0$ and $u(j+1) = u(j)1$ separately,

    $E\Bigl[\Bigl(\frac{X_\infty(u(j+1))}{X_\infty(u(j))}\Bigr)^2 \Bigm| \mathcal{F}_n\Bigr] = \frac{(\sigma(X_n, u(j+1)) + 1)(\sigma(X_n, u(j+1)) + 2)}{(\sigma(X_n,u(j)0) + \sigma(X_n,u(j)1) + 2)(\sigma(X_n,u(j)0) + \sigma(X_n,u(j)1) + 3)}$.

Using the conditional independence statement in Proposition 2 we see that we have
a telescoping product again, so that

    $E[X_\infty(u)^2 \mid \mathcal{F}_n] = \frac{(\sigma(X_n,u)+1)(\sigma(X_n,u)+2)}{(n+1)(n+2)}$ for all $u \in X_n \cup \partial X_n$.

The set $V \setminus X_n$ can be written as the disjoint union of the subtrees rooted at the
$n+1$ external nodes of $X_n$, and we have

    $E[\xi_u^2 \mid \mathcal{F}_n] = E[(1-\xi_u)^2 \mid \mathcal{F}_n] = \frac{1}{3}$ for all $u \notin X_n$.

Therefore,

    $\sum_{u \notin X_n} E[X_\infty(u)^2 \mid \mathcal{F}_n] = \frac{1}{(n+1)(n+2)} \sum_{u \in \partial X_n} (\sigma(X_n,u)+1)(\sigma(X_n,u)+2) \sum_{k=0}^{\infty} 2^k 3^{-k} = \frac{6}{n+2}$

in view of $\sigma(X_n,u) = 0$ for $u \in \partial X_n$. Taken together this gives

    $Z_n = \frac{1}{(n+1)(n+2)} \sum_{u \in X_n} (\sigma(X_n,u)+1)(\sigma(X_n,u)+2) + \frac{6}{n+2}$.

Using (26) we get

    $\sum_{u \in X_n} (\sigma(X_n,u)+1)(\sigma(X_n,u)+2) = \sum_{u \in X_n} \sigma(X_n,u)^2 + 3\,\mathrm{IPL}(X_n) + 5n$

so that, with (35),

    $\frac{1}{(n+1)(n+2)}\,\mathrm{WI}(X_n) = \frac{n}{(n+1)(n+2)}\,\mathrm{IPL}(X_n) + \frac{n^2}{(n+1)(n+2)} - Z_n + R_n$,

where $R_n$ tends to 0 almost surely and in quadratic mean. From Theorem 6 we
know that

    $\frac{1}{n}\,\mathrm{IPL}(X_n) - 2\log n \to 2\gamma - 4 + Y_\infty$

in the same sense. Combining the last two statements we obtain (37), with $W_\infty$ as
in (38). □



4.3. Metric silhouette. In our last application we consider an infinite-dimensional 
tree functional. 

Each element $v = (v_k)_{k \in \mathbb{N}}$ of $\partial V$ defines a path through a binary tree via the
sequence $(v(k))_{k \in \mathbb{N}}$ of nodes given by $v(k) = (v_1, \ldots, v_k)$, $k \in \mathbb{N}$. In [Grü09] the
'silhouette' $\mathrm{Sil}(x)$ of $x \in B$ was introduced in an attempt to obtain a search tree
analogue of the famous Harris encoding of simply generated trees: With each path
$v$, we record its exit level when passing through $x$, i.e.

    $\mathrm{Sil}(x)(v) := \min\{k \in \mathbb{N} : v(k) \notin x\}$, $v \in \partial V$.

The tree silhouette can be visualized as a function on the unit interval via the
binary expansion

(39)    $\Phi : [0,1) \to \partial V$, $t \mapsto (v_k)_{k \in \mathbb{N}}$ with $v_k := \lfloor 2^k t \rfloor - 2 \lfloor 2^{k-1} t \rfloor$.
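For illustration, the binary expansion map can be evaluated digit by digit. The sketch below assumes that the $k$-th coordinate of the path is the $k$-th binary digit of $t$, computed as $\lfloor 2^k t \rfloor - 2\lfloor 2^{k-1} t \rfloor$, and returns only a finite prefix of the infinite path; the function name `path_prefix` is hypothetical.

```python
from fractions import Fraction
from math import floor


def path_prefix(t, depth):
    """First `depth` binary digits of t in [0, 1), returned as a 0/1 string."""
    if not 0 <= t < 1:
        raise ValueError("t must lie in [0, 1)")
    return "".join(
        str(floor(2**k * t) - 2 * floor(2 ** (k - 1) * t))
        for k in range(1, depth + 1)
    )


print(path_prefix(Fraction(5, 8), 5))  # 5/8 = 0.101 in binary
print(path_prefix(Fraction(1, 3), 6))  # 1/3 = 0.010101... in binary
```

Exact fractions are used for the arguments so that the digits are not affected by floating-point rounding.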

It was shown in [Grü09] that for the BST chain $(X_n)_{n \in \mathbb{N}}$ some smoothing is necessary
to obtain an interesting limit for the stochastic processes $(\mathrm{Sil}(X_n)(\Phi(t)))_{0 \le t < 1}$
as $n \to \infty$.

We have seen in the previous sections that for search trees it makes sense to 
replace the canonical tree distance implicit in the above definition of Sil(x) by 
the subtree size metric. A corresponding variant of the silhouette is the metric 
silhouette, 



    $\mathrm{mSil}(x)(v) := \sum_{k=1}^{\infty} \sigma(x, v(k))$, $v \in \partial V$.
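A small worked example may help to compare the two silhouettes. The sketch below assumes that a path $v \in \partial V$ is represented by a finite 0/1 prefix long enough to leave the tree, that $\sigma(x,u)$ counts the nodes of $x$ below and including $u$, and that $\sigma(x,u) = 0$ for $u \notin x$, so that only the levels before the exit level contribute to the metric silhouette. The function names are hypothetical.

```python
def subtree_sizes(nodes):
    """sigma[u] = number of nodes of the tree in the subtree rooted at u."""
    sigma = {u: 0 for u in nodes}
    for w in nodes:
        for k in range(len(w) + 1):
            sigma[w[:k]] += 1
    return sigma


def silhouette(nodes, v):
    """Exit level: the smallest k >= 1 such that the prefix v[:k] is not in the tree."""
    for k in range(1, len(v) + 1):
        if v[:k] not in nodes:
            return k
    raise ValueError("path prefix is too short to leave the tree")


def metric_silhouette(nodes, v):
    """Sum of the subtree sizes along the path, starting below the root."""
    sigma = subtree_sizes(nodes)
    return sum(sigma[v[:k]] for k in range(1, silhouette(nodes, v)))


# Tree with root '', children '0' and '1', and grandchildren '00' and '01'.
x = {"", "0", "1", "00", "01"}
for v in ["0100", "1000", "0011"]:
    print(v, silhouette(x, v), metric_silhouette(x, v))
```

For the path starting with 0, 1 the exit level is 3, while the metric silhouette adds the subtree sizes 3 and 1 of the two nodes visited below the root.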
Again, our aim is to obtain a strong limit theorem in the BST situation, together
with a representation of the limit as a function of $X_\infty$. In addition, and going
beyond the individual arguments $v \in \partial V$, we regard $\mathrm{mSil}(X_n)$ as a random function
on $\partial V$. With $d_V$ as in (8) this is a compact and separable metric space ($V$ is open
in the completion $\bar V$ that we introduced in Section 2.3). We write $C(\partial V, d_V)$ for the
space of continuous functions $f : \partial V \to \mathbb{R}$. Together with

    $\|f\|_\infty := \sup_{v \in \partial V} |f(v)|$

this is a separable Banach space.

Remember that the values of $X_\infty$ are probability measures on $(\partial V, \mathcal{B}(\partial V))$. Let
$\Sigma_\infty : \partial V \to [0, \infty]$ be defined by

    $\Sigma_\infty(v) := -\int_{\partial V} \log_2\bigl(d_V(u,v)\bigr)\, X_\infty(du)$, $v \in \partial V$.

This is the logarithmic potential of the random measure $X_\infty$ with respect to $d_V$;
see [Woe00, p. 62]. Finally, we recall that a real function $f$ on the metric space
$(\partial V, d_V)$ is said to be (globally) Hölder continuous with exponent $\alpha$ if there exists
a constant $C < \infty$ such that

    $|f(u) - f(v)| \le C\, d_V(u,v)^{\alpha}$ for all $u, v \in \partial V$.

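Theorem 8. Let $\alpha_0 := -\log_2 \rho_0 = 0.33464\ldots$ with $\rho_0$ as in Theorem 3.

(a) $E\|\Sigma_\infty\|_\infty < \infty$.

(b) With probability 1, $\Sigma_\infty$ is Hölder continuous with exponent $\alpha$ for all $\alpha < \alpha_0$.

(c) As $n \to \infty$,

    $\Bigl\| \frac{1}{n}\,\mathrm{mSil}(X_n) - \Sigma_\infty \Bigr\|_\infty \to 0$ almost surely and in mean.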



Proof. Because of $d_V(u,v) = 2^{-|u \wedge v|}$ for all $u, v \in \partial V$ we have $-\log_2 d_V(u,v) \in \mathbb{N}_0$
and

    $-\log_2 d_V(u,v) \ge k \iff u \in A_{v(k)}$

for all $k \in \mathbb{N}$ so that

(40)    $\Sigma_\infty(v) = \sum_{k=1}^{\infty} X_\infty(v(k))$ for all $v \in \partial V$.

Now let $\alpha$ be as in the statement of the theorem; we may assume that $\alpha > 0$.
Let $\rho := 2^{-\alpha}$. By Theorem 3 there exists a set of probability 1 such that for all
$\omega$ in this set, $C(\omega) := \|X_\infty(\omega)\|_\rho < \infty$. We fix such an $\omega$ and drop it from
the notation. Because of $X_\infty(u) \le C \rho^{|u|}$ for all $u \in V$ and (40) we then have
$\Sigma_\infty(v) \le C \sum_{k=1}^{\infty} \rho^k$ for all $v \in \partial V$, which implies

    $\|\Sigma_\infty\|_\infty \le \frac{\rho}{1-\rho}\, \|X_\infty\|_\rho$.

In particular, $E\|\Sigma_\infty\|_\infty < \infty$ by (18) in the proof of Theorem 3.
Similarly, if $u, v \in \partial V$ are such that $|u \wedge v| = k$ then

    $|\Sigma_\infty(u) - \Sigma_\infty(v)| \le \sum_{j=k+1}^{\infty} X_\infty(u(j)) + \sum_{j=k+1}^{\infty} X_\infty(v(j)) \le 2C \sum_{j=k+1}^{\infty} \rho^{j} = \frac{2C\rho^{k+1}}{1-\rho} \le \frac{2C}{1-\rho}\, d_V(u,v)^{\alpha}$

by definition of $\rho$. This proves (b).

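For the proof of (c) we first consider the random functions $\Sigma_n$ defined by

    $\Sigma_n(v) := E[\Sigma_\infty(v) \mid \mathcal{F}_n]$, $v \in \partial V$.

With $J := \mathrm{Sil}(X_n)(v)$ we get, using monotone convergence for conditional expectations
and $\mathcal{F}_n$-measurability of $J$,

    $\Sigma_n(v) = \sum_{k=1}^{J} E[X_\infty(v(k)) \mid \mathcal{F}_n] + \sum_{k=J+1}^{\infty} E[X_\infty(v(k)) \mid \mathcal{F}_n]$
    $\qquad = \sum_{k=1}^{J} \frac{\sigma(X_n, v(k)) + 1}{n+1} + \frac{\sigma(X_n, v(J)) + 1}{n+1} \sum_{k=J+1}^{\infty} \Bigl(\frac{1}{2}\Bigr)^{k-J}$
    $\qquad = \frac{1}{n+1}\bigl(\mathrm{mSil}(X_n)(v) + J\bigr) + \frac{1}{n+1}$.

Here we have used our formula for $E[X_\infty(u) \mid \mathcal{F}_n]$ and its extension to nodes
outside $X_n$ that can be obtained as in the proof of Theorem 7.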

Let $h(x) := \max\{|u| : u \in x\}$ be the height of $x \in B$. Taking the supremum over
$v \in \partial V$ we get

    $\Bigl\| \Sigma_n - \frac{1}{n+1}\,\mathrm{mSil}(X_n) \Bigr\|_\infty \le \frac{h(X_n) + 1}{n+1} + \frac{1}{n+1}$.

It is easy to show that the right hand side converges to 0 with probability 1
(see [Dev86] for techniques and results on the height), hence it remains to prove
that $\Sigma_n$ converges almost surely and in mean to $\Sigma_\infty$ in the separable Banach space
$(C(\partial V, d_V), \|\cdot\|_\infty)$. This, however, is again immediate from the vector-valued
martingale convergence theorem given in [Nev75, p. 104]. □

Figure 2 shows the metric silhouette for the trees in Figure 1. Note that the
continuity in Theorem 8 refers to the space $(\partial V, d_V)$; for example, $(t_n)_{n \in \mathbb{N}}$ with
t n = | + (— 1)"^ for all n e N is a Cauchy sequence with respect to euclidean 
distance, but its image under the function $\Phi$ defined in (39) that we used for the
illustration is not a Cauchy sequence in $(\partial V, d_V)$. Loosely speaking, the function $\Phi$
'flattens' the node set $V$.

5. Conclusion 

The approach towards strong asymptotics of dynamic data structures that we 
have developed in detail for binary search trees should be applicable in many related 
situations. The necessary modifications may be minor, such as for the discounted 
path length that appears in [Grü09], or straightforward, as for the random recursive
trees that are often treated in parallel with binary trees, see e.g. [Nei02] for the
Wiener index, or they may be challenging, for example when we wish to amplify
the weak convergence results for node depth profiles obtained in [DJN08] for a
wide class of trees to strong limit theorems as we have done for the Wiener index in
Section 4.2 (as shown in [CDJH01], by finding a suitable martingale, these profiles
do converge almost surely in the BST case).

Of course, convergence in distribution and convergence along paths are rather
different phenomena, see Figures 1 and 2. It is interesting that for a given dynamical
structure we may have a strong limit theorem (with non-trivial limit) for some
aspects (functionals), but not for others; see [DG10] for such results in connection
with the subtree size profile of binary search trees.

Figure 2. The metric silhouette for the odd (left) and even (right)
π-data, with n = 50 (blue) and n = 100 (black).